

Search: all records where Creators/Authors contains "Schmitt, Paul"

Note: Clicking a Digital Object Identifier (DOI) link takes you to an external site maintained by the publisher. Some full-text articles may not be available free of charge during the embargo (an administrative interval).

Some links on this page may take you to non-federal websites, whose policies may differ from this site's.

  1. Recently, much attention has been devoted to the development of generative network traces and their potential use in supplementing real-world data for a variety of data-driven networking tasks. Yet the utility of existing synthetic traffic approaches is limited by their low fidelity: coarse feature granularity, insufficient adherence to task constraints, and subpar class coverage. As effective network tasks increasingly rely on raw packet captures, we advocate a paradigm shift from coarse-grained to fine-grained, constraint-compliant traffic generation. We explore this path using controllable diffusion-based methods. Our preliminary results suggest their effectiveness in generating realistic, fine-grained network traces that mirror the complexity and variety of real network traffic required for accurate service recognition. We further outline the challenges and opportunities of this approach and discuss a research agenda toward text-to-traffic synthesis.
    Free, publicly-accessible full text available November 28, 2024
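    A minimal, hypothetical sketch of the controllable-diffusion idea above: DDPM-style ancestral sampling over per-packet feature vectors (e.g., size, inter-arrival time, direction). The feature layout, noise schedule, and placeholder denoiser are assumptions for illustration; the paper's actual model and conditioning mechanism are not described in the abstract.

      import numpy as np

      T = 50                                  # number of diffusion steps
      betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule
      alphas = 1.0 - betas
      alpha_bars = np.cumprod(alphas)

      def denoiser(x, t, condition):
          """Stand-in for a learned noise-prediction network; a real model
          would use `condition` (e.g., a target service class) to steer it."""
          return np.zeros_like(x)             # placeholder: predicts zero noise

      def sample_trace(n_packets, n_features, condition, rng):
          """Reverse diffusion: start from Gaussian noise, iteratively denoise."""
          x = rng.standard_normal((n_packets, n_features))
          for t in reversed(range(T)):
              eps_hat = denoiser(x, t, condition)
              # DDPM posterior mean for x_{t-1} given x_t and predicted noise
              x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
              if t > 0:
                  x += np.sqrt(betas[t]) * rng.standard_normal(x.shape)
          return x

      trace = sample_trace(32, 3, "video", np.random.default_rng(0))

    With a trained denoiser, the condition is what makes generation controllable: the same sampler, steered toward different service classes, yields class-specific traces.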
  2. Operational networks commonly rely on machine learning models for many tasks, including detecting anomalies, inferring application performance, and forecasting demand. Yet model accuracy can degrade due to concept drift, whereby the relationship between the features and the target to be predicted changes. Mitigating concept drift is an essential part of operationalizing machine learning models in general, but it is of particular importance in networking's highly dynamic deployment environments. In this paper, we first characterize concept drift in a large cellular network for a major metropolitan area in the United States. We find that concept drift occurs across many important key performance indicators (KPIs), independently of the model, training-set size, and time interval, thus necessitating practical approaches to detect, explain, and mitigate it. We then show that frequent model retraining with newly available data is not sufficient to mitigate concept drift and can even degrade model accuracy further. Finally, we develop a new methodology for concept drift mitigation, Local Error Approximation of Features (LEAF). LEAF works by detecting drift; explaining the features and time intervals that contribute the most to drift; and mitigating drift using forgetting and over-sampling. We evaluate LEAF against industry-standard mitigation approaches (notably, periodic retraining) with more than four years of cellular KPI data. Our initial tests with a major cellular provider in the US show that LEAF consistently outperforms periodic and triggered retraining on complex, real-world data while reducing costly retraining operations.
    Free, publicly-accessible full text available September 28, 2024
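    The abstract names LEAF's three stages (detect, explain, mitigate via forgetting and over-sampling) but not their internals. The sketch below fills them in with assumed choices: a rolling-error drift detector, a correlation-based stand-in for the error-approximation explainer, and naive forgetting/over-sampling; none of these are the paper's actual mechanisms.

      import numpy as np

      def detect_drift(errors, window=30, threshold=1.5):
          """Flag drift when the recent mean error exceeds the long-run mean."""
          recent = np.mean(errors[-window:])
          baseline = np.mean(errors[:-window]) if len(errors) > window else recent
          return recent > threshold * baseline

      def explain_drift(X, errors):
          """Rank features by |correlation| with per-sample error, a stand-in
          for locating where in feature space the error concentrates."""
          corr = [abs(np.corrcoef(X[:, j], errors)[0, 1]) for j in range(X.shape[1])]
          return np.argsort(corr)[::-1]       # most drift-implicated features first

      def mitigate(X, y, errors, forget_frac=0.3, oversample_frac=0.2):
          """Forget the oldest samples; over-sample the highest-error ones."""
          keep = int(len(X) * (1 - forget_frac))
          X, y, errors = X[-keep:], y[-keep:], errors[-keep:]
          hard = np.argsort(errors)[-int(len(X) * oversample_frac):]
          return np.vstack([X, X[hard]]), np.concatenate([y, y[hard]])

    In a deployment loop, detect_drift would gate when explain_drift and mitigate run, with the model retrained on the reweighted data afterward.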
  3. Network management often relies on machine learning to make predictions about performance and security from network traffic. Often, the representation of the traffic is as important as the choice of the model. The features that the model relies on, and the representation of those features, ultimately determine model accuracy, as well as where and whether the model can be deployed in practice. Thus, the design and evaluation of these models ultimately requires understanding not only model accuracy but also the systems costs associated with deploying the model in an operational network. Towards this goal, this paper develops a new framework and system that enable a joint evaluation of both the conventional notions of machine learning performance (e.g., model accuracy) and the systems-level costs of different representations of network traffic. We highlight these two dimensions for two practical network management tasks, video streaming quality inference and malware detection, to demonstrate the importance of exploring different representations to find the appropriate operating point. We demonstrate the benefit of exploring a range of representations of network traffic and present Traffic Refinery, a proof-of-concept implementation that both monitors network traffic at 10 Gbps and transforms traffic in real time to produce a variety of feature representations for machine learning. Traffic Refinery both highlights this design space and makes it possible to explore different representations for learning, balancing systems costs related to feature extraction and model training against model accuracy.
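    The trade-off the paper explores, richer traffic representations versus the systems cost of producing them, can be illustrated with a toy measurement: compute two candidate feature representations of the same packet stream and time each. The representation names and the packet-tuple layout are illustrative, not Traffic Refinery's actual API.

      import time
      from collections import Counter

      # Synthetic (timestamp, size, direction) tuples standing in for captured packets.
      packets = [(0.001 * i, 1500 if i % 3 else 80, "up" if i % 2 else "down")
                 for i in range(10_000)]

      def flow_counters(pkts):
          """Coarse representation: aggregate packet/byte counts."""
          return {"packets": len(pkts), "bytes": sum(s for _, s, _ in pkts)}

      def size_histogram(pkts, bucket=200):
          """Finer-grained representation: packet-size distribution."""
          return Counter(s // bucket * bucket for _, s, _ in pkts)

      for name, fn in [("flow_counters", flow_counters),
                       ("size_histogram", size_histogram)]:
          start = time.perf_counter()
          features = fn(packets)
          cost_ms = (time.perf_counter() - start) * 1e3
          print(f"{name}: {len(features)} features, {cost_ms:.2f} ms to extract")

    At line rate, this extraction cost (and the state each representation keeps) is exactly what must be balanced against the accuracy the representation buys.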
  4. The digital divide, and in particular the homework gap, has been exacerbated by the COVID-19 pandemic, laying bare not only the inequities in broadband Internet access but also how these inequities ultimately affect citizens' ability to learn, work, and play. Addressing these inequities ultimately requires holistic, "full stack" data on the nature of the gaps in infrastructure and uptake: from the physical infrastructure (e.g., fiber, cable) to speed and application performance to affordability and the neighborhood effects that ultimately determine whether a technology is adopted. This paper surveys how various existing datasets can (and cannot) shed light on these gaps, the limitations of those datasets, what we know from existing data about how the Internet responded to shifts in traffic during COVID-19, and, importantly for the future, what data we need to better understand these problems moving forward and how the research community, policymakers, and the public might gain access to such data. Keywords: digital divide, Internet, mapping, performance
  5. During the early weeks and months of the COVID-19 pandemic, significant changes in Internet usage occurred as a result of a sudden global shift to people working, studying, and quarantining at home. One aspect this affected was interconnection between networks. This paper explores the effects of these changes on Internet interconnection points in terms of utilization, traffic ratios, and other performance characteristics such as latency.
  7. Inferring the quality of streaming video applications is important for Internet service providers, but the fact that most video streams are encrypted makes it difficult to do so. We develop models that infer quality metrics (i.e., startup delay and resolution) for encrypted streaming video services. Our paper builds on previous work but extends it in several ways. First, the models work in deployment settings where the video sessions and segments must be identified from a mix of traffic and where the time precision of the collected traffic statistics is coarser (e.g., due to aggregation). Second, we develop a single composite model that works for a range of different services (i.e., Netflix, YouTube, Amazon, and Twitch), as opposed to just a single service. Third, unlike many previous models, our models perform predictions at finer granularity (e.g., the precise startup delay instead of just detecting short versus long delays), allowing us to draw better conclusions about the ongoing streaming quality. Fourth, we demonstrate that the models are practical through a 16-month deployment in 66 homes and provide new insights about the relationship between Internet "speed" and the quality of the corresponding video streams for a variety of services; we find that higher speeds provide only minimal improvements to startup delay and resolution.
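    A hedged sketch of the modeling task this abstract describes: regressing startup delay from coarse per-session traffic counters. The features, the synthetic data, and the random-forest model are assumptions for illustration; the paper's actual features and composite-model architecture are not given in the abstract.

      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      rng = np.random.default_rng(0)
      n = 500
      # Assumed per-session features: early throughput, segment-request count,
      # and mean packet size (all synthetic).
      X = np.column_stack([
          rng.uniform(0.1, 10.0, n),          # throughput (Mbps)
          rng.integers(1, 20, n),             # segment requests in first 10 s
          rng.uniform(400, 1500, n),          # mean packet size (bytes)
      ])
      # Toy target: startup delay shrinks with throughput, with diminishing
      # returns, loosely mirroring the finding that higher speeds help only marginally.
      y = 5.0 / X[:, 0] + rng.normal(0, 0.2, n)

      model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[:400], y[:400])
      pred = model.predict(X[400:])
      print("mean absolute error (s):", np.mean(np.abs(pred - y[400:])).round(3))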